Semi-supervised Text Annotation for Hate Speech Detection using K-Nearest Neighbors and Term Frequency-Inverse Document Frequency
Abstract
Sentiment analysis can detect hate speech using Natural Language Processing (NLP). This process requires the text to be annotated with labels. However, when annotation is carried out by people, it must be done by experts in the field of hate speech so that there is no subjectivity. In addition, manual annotation takes a long time and allows errors on extensive data. To solve this problem, we propose automatic annotation based on semi-supervised learning with the K-Nearest Neighbor (KNN) algorithm. Term frequency-inverse document frequency (TF-IDF) is used for feature extraction to obtain optimal results. KNN with TF-IDF was able to annotate the data and increase accuracy in detecting hate speech from 57.25% in the initial iteration to 59.68% (reported as an increase of < 2%). The dataset contains 13169 records, split 80:20 into training and testing data. There are 2370 labeled records; for testing, 1317; and 9482 unannotated records after preprocessing. The final result of the process is 11235 annotated records.
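The semi-supervised annotation idea in the abstract can be illustrated with a small self-training loop. This is only a sketch under stated assumptions, not the authors' implementation: the function names (`tfidf_vectors`, `knn_predict`, `self_train`), whitespace tokenization, cosine similarity as the neighbor measure, the values of `k` and `threshold`, and the toy sentences are all hypothetical choices made for illustration. It uses only the Python standard library.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document (whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF keeps weights positive for terms found in every document.
        vectors.append({t: (c / len(tokens)) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
                        for t, c in counts.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def knn_predict(vec, labeled, k):
    """Majority vote over the k most similar labeled vectors.

    Returns the winning label and its share of the k votes.
    """
    nearest = sorted(labeled, key=lambda item: cosine(vec, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)

def self_train(labeled_texts, labels, unlabeled_texts, k=3, threshold=0.6, max_iter=5):
    """Iteratively pseudo-label unlabeled texts (assumed self-training scheme)."""
    vectors = tfidf_vectors(list(labeled_texts) + list(unlabeled_texts))
    n = len(labeled_texts)
    labeled = list(zip(vectors[:n], labels))
    pending = dict(enumerate(vectors[n:]))  # index in unlabeled_texts -> vector
    annotations = {}
    for _ in range(max_iter):
        # Accept only predictions whose vote share reaches the threshold.
        accepted = {}
        for idx, vec in pending.items():
            label, share = knn_predict(vec, labeled, k)
            if share >= threshold:
                accepted[idx] = label
        if not accepted:
            break
        # Newly annotated texts join the labeled pool for the next iteration.
        for idx, label in accepted.items():
            labeled.append((pending.pop(idx), label))
            annotations[idx] = label
    return annotations

# Hypothetical toy data, for illustration only.
labeled_texts = ["i hate those people", "they are disgusting people",
                 "have a lovely day", "what a lovely friendly group"]
labels = ["hate", "hate", "neutral", "neutral"]
unlabeled_texts = ["i hate disgusting people", "lovely friendly day"]

annotations = self_train(labeled_texts, labels, unlabeled_texts)
print(annotations)
```

Accepting only predictions above a vote-share threshold is one common way to limit label noise in self-training; the abstract does not state which acceptance rule or distance measure was actually used.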
Similar resources
Text Clusters Labeling using WordNet and Term Frequency-Inverse Document Frequency
Cluster Labeling is the process of assigning appropriate and well descriptive titles to text documents. The most suitable label not only explains the central theme of a particular cluster but also provides a means to differentiate it from other clusters in an efficient way. In this paper we proposed a technique for cluster labeling which assigns a generic label to a cluster that may or may not ...
SentiTFIDF – Sentiment Classification using Relative Term Frequency Inverse Document Frequency
Sentiment Classification refers to the computational techniques for classifying whether the sentiments of text are positive or negative. Statistical Techniques based on Term Presence and Term Frequency, using Support Vector Machine are popularly used for Sentiment Classification. This paper presents an approach for classifying a term as positive or negative based on its proportional frequency c...
K-Nearest Neighbors Relevance Annotation Model for Distance Education
With the rapid development of Internet technologies, distance education has become a popular educational mode. In this paper, the authors propose an online image automatic annotation distance education system, which could effectively help children learn interrelations between image content and corresponding keywords. Image automatic annotation is a significant problem in image retrieval and ima...
Inverse-Category-Frequency based Supervised Term Weighting Schemes for Text Categorization
Term weighting schemes often dominate the performance of many classifiers, such as kNN, centroid-based classifier and SVMs. The widely used term weighting scheme in text categorization, i.e., tf.idf, is originated from information retrieval (IR) field. The intuition behind idf for text categorization seems less reasonable than IR. In this paper, we introduce inverse category frequency (icf) int...
Journal
Journal title: International Journal of Advanced Computer Science and Applications
Year: 2022
ISSN: 2158-107X, 2156-5570
DOI: https://doi.org/10.14569/ijacsa.2022.0131020